Power-Aware Real-Time Scheduling 
upon Identical Multiprocessor Platforms 



Vincent Nelis 12 Joel Goossens 1 Nicolas Navet 3 Raymond Devillers 1 

Dragomir Milojevic 1 

March 10, 2008 



Abstract 

In this paper, we address the power-aware scheduling of sporadic constrained-deadline 
hard real-time tasks using dynamic voltage scaling upon multiprocessor platforms. We propose 
two distinct algorithms. Our first algorithm is an off-line speed determination mechanism which 
provides an identical speed for each processor. That speed guarantees that all deadlines are 
met if the jobs are scheduled using EDF. The second algorithm is an on-line and adaptive speed 
adjustment mechanism which reduces the energy consumption while the system is running. 

1 Introduction 

1.1 Context of the study

Some important applications impose temporal constraints on the response time while running
on systems with a limited power supply (such as real-time communication in satellites). As a
result, the research community has investigated low-power system design for the past 15 years.
In particular, the dynamic voltage scaling (DVS) framework has become a major concern
for power-aware computer systems. This framework consists in minimizing the system energy
consumption by adjusting the working voltage and frequency of the CPU. For real-time systems,
the DVS framework focuses on minimizing the energy consumption while respecting all the timing
constraints.

Many power-constrained embedded systems are built upon multiprocessor platforms because 
of high-computational requirements and because multiprocessing often significantly simplifies the 
design. As pointed out in [4], another advantage is that multiprocessor systems are more energy 
efficient than equally powerful uniprocessor platforms, because raising the frequency of a single 
processor results in a multiplicative increase of the consumption while adding processors leads 
to an additive increase. 

1.2 Problem definition 

In the following, we consider the problem of minimizing the energy consumption needed for exe- 
cuting a set of sporadic constrained-deadline real-time tasks scheduled upon a fixed number of 
identical processors. The scheduling is preemptive and uses the global EDF policy [15]. "Global"
scheduling algorithms, contrary to partitioned algorithms, allow different instances of the
same task (also called jobs or processes) to be executed upon different processors. Each
process can start its execution on any processor and may migrate at run-time from one processor to
another if it is meanwhile preempted by earlier-deadline processes.

We first tackle the problem of choosing the smallest (or so) processor frequency for the set 
of CPUs, such that all deadlines will be met. The procedure is performed off-line (i.e., before the 
system starts its execution) and provides a static result in the sense that the computed speed 
does not change over time. Such a static solution is sufficient to significantly reduce the energy 
consumption; however, due to the discrepancy between Worst-Case Execution Time (WCET) 
and Actual-Case Execution Time (ACET) (TTJ, it usually leads to pessimistic results. In a second 
step, we thus propose an on-line scheme that takes advantage of unused CPU slots to further 
reduce the energy consumption. 

1.3 Previous work 

There is a large body of research on uniprocessor energy-aware scheduling but much
less on the multiprocessor case, where low-power scheduling problems are often NP-hard when
the actual applicative constraints are taken into account (see [7] for a starting point). Among the
most interesting studies, one can cite [14] where the authors provide power-aware scheduling al- 
gorithms for bag-of-tasks applications with deadline constraints on DVS-enabled cluster systems. 
A study particularly relevant to the DVS framework is [6] which targets energy-efficient scheduling 
of periodic real-time tasks over multiple DVS processors with the considerations of power con- 
sumption due to leakage current (i.e. the static part of the energy dissipation). In [8], the authors 
propose a set of multiprocessor energy-efficient task scheduling algorithms with different task 
remapping and slack reclaiming schemes, where tasks have the same arrival time and share a 
common deadline. A large number of such "slack reclaiming" approaches have been developed
over the years for the uniprocessor case. Among those, some strategies dynamically collect
the unused computation times at the end of each job and share it among the remaining active 
jobs. Examples of algorithms following this "reclaiming" approach, include the ones proposed 
in [Tj3[TB][SIl[5]. Some reclaiming algorithms even anticipate the early completion of tasks for 
further reducing the CPU speed [16, 3], some having different levels of "aggressiveness" fj]. 

1.4 Contribution of the paper 

Unlike the work considered in [4], we study the case where the number of processors is already
fixed. This constraint can be imposed by the availability of hardware components or by design
considerations not related to power consumption. Notice that in practical situations, the task
characteristics are unknown at (hardware) design time.

The first contribution of this paper is based on [13] and provides a technique which
determines the minimum off-line processor speed for the fixed and identical multiprocessor platform
using EDF.

The second, and the main contribution of this document, is a slack reclaiming algorithm which 
is, to the best of our knowledge, the first of its kind for the global preemptive scheduling problem 
of tasks with distinct deadlines on multiprocessor platforms. This contribution can be considered as
an extension to the multiprocessor case of a previous proposal of Shin and Choi [19], which is
usually referred to as "One Task Extension" (OTE). We proved that our on-line proposal does not
jeopardize the system feasibility.

Organization of the paper. The document is organized as follows: in Section 2 we introduce
our model of computation, in particular our task model; in Section 3 we present our off-line
processor speed determination; in Section 4 we present our on-line speed reduction technique;
in Section 5 we present our experimental results; in Section 6 we consider our future work and
in Section 7 we conclude.

2 Model of computation



2.1 Application model 

We consider in this paper the scheduling of sporadic constrained-deadline tasks, i.e., systems
where each task $\tau_i = (C_i, D_i, T_i)$ is characterized by three parameters: a worst-case execution
requirement (WCET) denoted $C_i$, a minimal inter-arrival delay $T_i$ and a deadline $D_i \le T_i$, with the
interpretation that the task generates successive jobs $\tau_{i,j}$ (with $j = 1, 2, \ldots, \infty$) arriving at times
$e_{i,j}$ such that $e_{i,j+1} - e_{i,j} \ge T_i$; each such job has an execution requirement of at most $C_i$ execution
units, and must be completed by its deadline, noted $D_{i,j} = e_{i,j} + D_i$. We assume that
the worst-case execution time is never larger than the deadline, i.e., $C_i \le D_i$. We assume that
preemption is allowed: an executing job may be interrupted, and its execution resumed later
(possibly upon another processor), with no loss or penalty. Let $\tau = \{\tau_1, \tau_2, \ldots, \tau_n\}$ denote a
sporadic task system. For each task $\tau_i$, we define its density $\lambda_i$ as the ratio of its execution
requirement to its deadline: $\lambda_i \stackrel{\mathrm{def}}{=} C_i / D_i$. Since $C_i \le D_i$, we have that $\lambda_i \le 1$. We also define the
total density $\lambda_{sum}(\tau)$ of the sporadic task system $\tau$ as $\lambda_{sum}(\tau) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n} \lambda_i$, and its maximal density as
$\lambda_{max}(\tau) \stackrel{\mathrm{def}}{=} \max_{\tau_i \in \tau} \lambda_i$. Without loss of generality, we assume in the remainder of the paper that
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$, and consequently $\lambda_{max}(\tau) = \lambda_1$.
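
As an illustration of these definitions, the following minimal Python sketch computes the densities of a small task set and sorts the tasks by non-increasing density. The class name `Task`, its field names and the sample parameters are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Sporadic task (C, D, T); names are illustrative, not from the paper."""
    wcet: float      # C_i, worst-case execution requirement at maximal speed
    deadline: float  # D_i, relative deadline (C_i <= D_i <= T_i)
    period: float    # T_i, minimal inter-arrival delay

    @property
    def density(self) -> float:
        # lambda_i = C_i / D_i
        return self.wcet / self.deadline


def sort_by_density(tasks):
    """Return the tasks in non-increasing density order (lambda_1 first)."""
    return sorted(tasks, key=lambda task: task.density, reverse=True)


def total_density(tasks):
    """lambda_sum: sum of all task densities."""
    return sum(task.density for task in tasks)


def max_density(tasks):
    """lambda_max: largest task density."""
    return max(task.density for task in tasks)


tasks = sort_by_density([Task(1, 4, 5), Task(2, 4, 4), Task(3, 10, 12)])
print([task.density for task in tasks], total_density(tasks), max_density(tasks))
```

After sorting, the first task of the list plays the role of the maximal-density task.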

2.2 Platform model 

In our platform model, a processor can dynamically adapt its working frequency in some
continuous range $[f_{min}, f_{max}]$. The case where the number of frequencies is finite can be addressed as
in [12]. In the remainder of this paper, we denote by $s(t)$ the processor speed at any time-instant $t$.
The processor speed $s(t)$ is defined as the ratio of its current functioning frequency (say $f(t)$) over
the maximal frequency $f_{max}$, i.e., $s(t) \stackrel{\mathrm{def}}{=} f(t) / f_{max}$, with $f_{min} \le f(t) \le f_{max}$. Notice that the processor
speed always lies between 0 and 1, whatever the values of $f_{min}$ and $f_{max}$, and to each speed
corresponds exactly one frequency.

We consider in this document multiprocessor platforms composed of a known and fixed
number $m$ of identical processors $\{P_1, P_2, \ldots, P_m\}$ upon which a set of real-time tasks is scheduled.
The working power of each processor may be characterized by its speed (or computing capacity)
$s$, with the interpretation that a job that executes on a processor of speed $s$ for $R$ time units
completes $s \times R$ units of execution. The minimal and maximal admissible speeds of all processors
are identical and are denoted by $s_{min} \stackrel{\mathrm{def}}{=} f_{min} / f_{max}$ and $s_{max} \stackrel{\mathrm{def}}{=} f_{max} / f_{max} = 1$, respectively. Since we
assume that the range of available frequencies is continuous between $f_{min}$ and $f_{max}$, the speed
of the processors can take any real value between $s_{min}$ and $s_{max}$ at every instant. Notice that the
task computing requirements (the $C_i$'s) are defined for the maximal speed $s_{max}$.

In Section 3 we assume that all the processors share a common speed which is fixed before
the system starts its execution. This speed does not change during the scheduling and thus,
we will use the notation $s$ instead of $s(t)$ to simplify the presentation. Then, we study the case in
Section 4 where each processor may run at a different speed and may change it at any time during
the scheduling. In our work, speed assignments are determined at job-level: voltage/speed 
changes only occur at job dispatching instants. That is, once a job is assigned to a CPU, the 
CPU speed is fixed until the job is preempted or completed. 

3 Off-line speed determination 

3.1 Introduction 

Off-line processor speed determination is the process of determining, during the design of the
real-time application, the lowest processor speed $s$ in order to schedule the sporadic task set $\tau$
upon an identical multiprocessor platform with $m$ processors running at speed $s$. In this section,
we consider the case where, at any instant, all processors must be running at the same speed,
noted $s$. We shall use the following result:

Theorem 1 (Bertogna, Cirinei and Lipari [5]). Any sporadic constrained-deadline task system $\tau$
satisfying

    $\lambda_{sum}(\tau) \le m - (m - 1) \cdot \lambda_{max}(\tau)$

is schedulable by the EDF algorithm upon a platform with $m$ identical processors.

Then, we get the following sufficient feasibility condition:

Corollary 1. A sporadic constrained-deadline task system $\tau$ is EDF-schedulable upon an
identical multiprocessor platform with $m$ processors running at speed $s$ if:

    $s \ge \lambda_{max}(\tau) + \frac{\lambda_{sum}(\tau) - \lambda_{max}(\tau)}{m}$    (1)

Notice that, from Expression (1) (which is a sufficient condition), $s$ is always greater than or
equal to $\lambda_{max}(\tau)$, which is a necessary condition to ensure the system schedulability, whatever
the scheduling algorithm.
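
One way to see how Corollary 1 follows from Theorem 1 is sketched below. The sketch assumes that running every processor at speed $s$ multiplies each execution time, and hence each density, by $1/s$; this scaling argument is our reading and is not spelled out in the text.

```latex
\frac{\lambda_{sum}(\tau)}{s} \;\le\; m - (m-1)\,\frac{\lambda_{max}(\tau)}{s}
\;\Longleftrightarrow\;
\lambda_{sum}(\tau) + (m-1)\,\lambda_{max}(\tau) \;\le\; m\,s
\;\Longleftrightarrow\;
s \;\ge\; \lambda_{max}(\tau) + \frac{\lambda_{sum}(\tau) - \lambda_{max}(\tau)}{m}
```

The last step only regroups $\bigl(\lambda_{sum}(\tau) + (m-1)\lambda_{max}(\tau)\bigr)/m$ as $\lambda_{max}(\tau)$ plus the remaining density shared among the $m$ processors.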



3.2 Algorithm EDF$^{(k)}$

Following an idea from [13], but adapted to our off-line speed determination where the number of 
processors is fixed, we shall present an improvement on the speed needed in order to schedule 
sporadic task sets. 



Algorithm EDF$^{(k)}$ (Goossens, Funk and Baruah [13]): Assuming that the task indexes are
sorted by non-increasing order of task densities and $1 \le k \le m$, EDF$^{(k)}$ assigns priorities to jobs
of tasks in $\tau$ according to the following rules:

For all $i < k$, $\tau_i$'s jobs are assigned the highest priority (ties are broken arbitrarily).

For all $i \ge k$, $\tau_i$'s jobs are assigned priorities according to EDF (ties are again broken arbitrarily).

That is, Algorithm EDF$^{(k)}$ assigns the highest priority to jobs generated by the $(k - 1)$ tasks in $\tau$
that have highest densities, and assigns priorities according to deadlines to jobs generated by all
other tasks in $\tau$ (thus, "pure" EDF is EDF$^{(1)}$). We show in the following that we get another lower
bound for the speed $s$ when using EDF$^{(k)}$ instead of EDF, and this bound is always lower than (or
equal to) the one provided by Expression (1). But first, we introduce the notation $\tau^{(i)}$ to refer to
the task system composed of the $(n - i + 1)$ minimum-density tasks in $\tau$: $\tau^{(i)} \stackrel{\mathrm{def}}{=} \{\tau_i, \tau_{i+1}, \ldots, \tau_n\}$
(according to this notation, $\tau = \tau^{(1)}$).
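
The priority rule can be sketched as a sort key, assuming jobs are described by a 1-based task index (in non-increasing density order) and an absolute deadline. The function name and the job tuples are hypothetical; ties are left to the sort's stable order, which is one arbitrary tie-breaking rule among others.

```python
import math


def edf_k_priority_key(task_index, absolute_deadline, k):
    """Sort key for EDF^(k): a smaller key means a higher priority.

    Jobs of the (k - 1) highest-density tasks behave as if their
    deadline were minus infinity; all other jobs follow plain EDF.
    """
    return -math.inf if task_index < k else absolute_deadline


# (task index, absolute deadline) of four pending jobs
jobs = [(1, 30.0), (2, 12.0), (3, 20.0), (4, 15.0)]

ordered = sorted(jobs, key=lambda job: edf_k_priority_key(job[0], job[1], k=2))
pure_edf = sorted(jobs, key=lambda job: edf_k_priority_key(job[0], job[1], k=1))
print(ordered)
print(pure_edf)
```

With k = 2 the job of the first task is served first despite its late deadline, whereas k = 1 gives the usual EDF order.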

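Theorem 2. Any sporadic constrained-deadline task system $\tau$ is EDF$^{(k)}$-schedulable upon an
identical multiprocessor platform with $m$ processors at speed $s_k$ if

    $s_k \ge \max\left\{\lambda_1,\; \lambda_k + \frac{\lambda_{sum}(\tau^{(k+1)})}{m - k + 1}\right\}$

Corollary 2. A sporadic constrained-deadline task system $\tau$ is schedulable upon $m$ processors
at speed $s_{ol}$ by EDF$^{(\ell)}$, with

    $s_{ol} \stackrel{\mathrm{def}}{=} \max\left\{\lambda_1,\; \min_{k=1}^{m}\left\{\lambda_k + \frac{\lambda_{sum}(\tau^{(k+1)})}{m - k + 1}\right\}\right\}$    (2)

and $\ell$ is the value of $k$ minimizing the speed $s_k$.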

Proof. The proof is a direct consequence of Theorem 2. □

It may be seen that this expression always yields a bound at least as good as Inequality (1):
for $k = 1$, the inner term of (2) is exactly the right-hand side of (1), which is itself never smaller
than $\lambda_1$.

3.3 Implementation



A more detailed description of our off-line speed determination mechanism is given by
Algorithm 1. Let $s_{ol}$ denote the returned speed, defined by Expression (2). Before applying this
algorithm, we assume that the number of processors is sufficient to schedule the system $\tau$ at the
maximal speed. Consequently, the speed $s_{ol}$ is initially set to $s_{max}$. Then, the algorithm
searches the minimal speed by sweeping the value of $k$ between 1 and $m$.
Finally, in order that EDF$^{(k)}$ assigns the highest priorities to the $(k - 1)$ tasks that have highest
densities, we set the deadline of these tasks to $-\infty$.
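
The procedure described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function name, the argument names and the default speed bounds are assumptions, and the task set is reduced to its list of densities sorted in non-increasing order.

```python
def offline_speed(densities, m, s_min=0.0, s_max=1.0):
    """Sweep k from 1 to m and keep the smallest sufficient speed.

    densities: task densities in non-increasing order (lambda_1 first).
    Returns (s_ol, k_opt); tasks with index < k_opt get top priority.
    """
    lam = list(densities)
    n = len(lam)
    k_opt = 1
    s_ol = s_max
    s_limit = max(s_min, lam[0])  # the speed can never go below lambda_1
    k = 1
    while k <= m and s_ol > s_limit:
        tail = sum(lam[k:])                     # lambda_sum of tau^(k+1)
        lam_k = lam[k - 1] if k <= n else 0.0   # guard for m > n (assumption)
        s = max(lam[0], lam_k + tail / (m - k + 1))
        if s < s_ol:
            s_ol, k_opt = s, k
        k += 1
    if s_ol < s_limit:
        s_ol = s_limit
    return s_ol, k_opt


print(offline_speed([0.8, 0.2, 0.2, 0.2], m=2))
print(offline_speed([0.5, 0.5, 0.5], m=3))
```

In the first call, the candidate for k = 1 exceeds the maximal speed, while k = 2 brings the required speed down to the largest density.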



Algorithm 1: Off-line speed determination

Input: $\tau$, $m$, $s_{max}$, $s_{min}$
Output: $s_{ol}$
begin
    $k_{opt}$ := 1;
    $s_{ol}$ := $s_{max}$;
    $s_{limit}$ := $\max\{s_{min}, \lambda_1\}$;
    for ($k$ := 1; $k \le m$ and $s_{ol} > s_{limit}$; $k$ := $k + 1$) do
        $s$ := $\max\{\lambda_1, \lambda_k + \lambda_{sum}(\tau^{(k+1)}) / (m - k + 1)\}$;
        if ($s < s_{ol}$) then
            $s_{ol}$ := $s$;
            $k_{opt}$ := $k$;
    if ($s_{ol} < s_{limit}$) then $s_{ol}$ := $s_{limit}$;
    foreach $\tau_i \in \{\tau_1, \ldots, \tau_{k_{opt}-1}\}$ do $D_i$ := $-\infty$;
    return ($s_{ol}$);
end

4 Multiprocessor One Task Extension
4.1 Introduction 

In this section, we consider the case where processors still share the same minimal and maximal
speeds $s_{min}$ and $s_{max}$, but each one may run at its own execution speed during the scheduling.
We assume that, when a processor is idle, its execution speed is always fixed to the minimal
common speed $s_{min}$. We propose a low-complexity on-line algorithm that aims to further reduce
the speeds of the CPUs by performing "local" adjustments, when it is safe to reduce the speed
below $s_{ol}$ defined by Equation (2).

We term our technique MOTE for Multiprocessor One Task Extension, since it is a multiprocessor
version of the technique proposed in [19] and usually referred to as OTE. The idea is the
following: the speed of a CPU can safely be reduced below the speed $s_{ol}$ during the execution of
a job if the reduced speed does not change anything with respect to the schedule of the subsequent
jobs scheduled on that CPU. More precisely, subsequent jobs will not be delayed by more
(nor less) higher-priority workload than with $s_{ol}$.



4.2 Notations 

We denote by $t$ the current time in the schedule and by $B_i(t)$ the last release time of $\tau_i$ before
or at time $t$, with $B_i(0)$ initially set to $-T_i$ (see Equation (3) to understand this initialization). During
the scheduling, $B_i(t)$ is updated each time a job is released by $\tau_i$. The ready queue, denoted
by ready-Q, holds all the pending jobs (i.e., ready to be executed but waiting for a CPU) sorted
according to the EDF$^{(k)}$ rule, where ties are broken according to an arbitrary rule; recall that using
EDF$^{(k)}$, the priorities of the jobs are constant. In the following, $s_i$ denotes the processor speed for
the job $\tau_{i,j}$ at time $t$. We shall use the following functions.

The function $\mathcal{A}_i(t, t')$ indicates if the sporadic task $\tau_i$ may generate a job at time $t' \ge t$. Since
$T_i$ denotes the minimal inter-arrival delay between job releases of the sporadic task $\tau_i$, we get:

    $\mathcal{A}_i(t, t') \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if } B_i(t) + T_i \le t' \\ 0 & \text{otherwise} \end{cases}$    (3)

Notice that $B_i(0)$ is initially set to $-T_i$ in order to have $\mathcal{A}_i(0, 0) = 1$, since our task model considers
that each task may release its first job at time $t = 0$.

Then, the function $\mathrm{PotAct}_i(t, t')$ (for Potentially Active at time $t'$) indicates if $\tau_i$ has an active
job at time $t$ which may still be active at time $t'$. This function returns 1 only if $\tau_i$ is active at time $t$
and if $t'$ is not larger than the deadline of this job:

    $\mathrm{PotAct}_i(t, t') \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if } \omega_i^{s_i}(t) > 0 \text{ and } t \le t' \le B_i(t) + D_i \\ 0 & \text{otherwise} \end{cases}$

where $\omega_i^{s_i}(t)$ denotes the remaining worst-case execution requirement of the last released job of
$\tau_i$ if executed at speed $s_i$ (if a job is done, its $\omega$ is set to zero, even if the WCET is not exhausted).

Theorem 3. The function

    $\Pi(\tau_{u,v}, t, t') \stackrel{\mathrm{def}}{=} m - \sum_{\tau_i \in \tau \setminus \{\tau_u\}} \mathrm{PotAct}_i(t, t') - \sum_{\tau_i \in \tau} \mathcal{A}_i(t, t')$,

if non-negative, provides a lower bound of the number of available CPUs at time $t' \ge t$, when
ignoring the schedule of the current job of $\tau_u$ (if any).

Corollary 3. At each time $t$ where a job $\tau_{u,v}$ is allocated to CPU $P_\ell$, the earliest future time
instant in the schedule such that $P_\ell$ may be required by another job (possibly from the same
task) is given by:

    $t_{next} \stackrel{\mathrm{def}}{=} \begin{cases} \min\{t' \ge t \mid \Pi(\tau_{u,v}, t, t') \le 0\} & \text{if } m < n \\ +\infty & \text{otherwise} \end{cases}$


4.3 MOTE scheme 

EDF$^{(k)}$ is a job-level fixed-priority algorithm; consequently, a job executing on a CPU can only
leave it upon its completion or upon the release of a higher-priority job. In our scheme, the speed
reduction of a job is decided when the job is allocated to a CPU, either for the first time or when it
resumes after being preempted. Upon its release, a job is inserted into the ready-Q if it cannot receive
a processor (i.e., all processors are used and the job is of lower priority). We do not make any
assumptions on the CPU allocation rule when several CPUs are available for a single job. For
instance, free CPUs can be granted according to the rule "smaller CPU index first."

Since we consider multiprocessor platforms, we have to be very careful with any
change in the original schedule because of scheduling anomalies. We say that a scheduling
algorithm suffers from anomalies if a change which is intuitively positive in a schedulable system
can turn it unschedulable. An "intuitively positive change" is a change which seems to help the
scheduling, like reducing the density of a task (by increasing its period or reducing its execution
requirement) or advancing the start-time of a job; this can also be an increase of the number
of processors on the platform. Unfortunately, multiprocessor platforms are subject to scheduling
anomalies [2]. For that reason, our on-line low-power mechanism only focuses on the last
allocated job and avoids changing the schedule of the other jobs.

Figure 1 illustrates the main idea of our on-line algorithm when 3 tasks are scheduled upon
3 processors at speed $s_{ol}$. This example shows a schedule where $t$ is the current time, $\tau_{1,1}$, $\tau_{2,1}$
and $\tau_{3,1}$ are the active jobs at time $t$ (the ready queue is empty since there are only three tasks
in the system), and plain circles and vertical arrows represent the deadlines and the (earliest)
arrival times (since tasks are sporadic) of each task, respectively. Suppose that $\tau_{1,1}$ and $\tau_{2,1}$ are
allocated to $P_1$ and $P_2$. Before allocating $\tau_{3,1}$ to the processor $P_3$, we see that $P_3$ cannot be
required by another job than $\tau_{3,1}$ until time $t_{next}$. Indeed, $\tau_{1,2}$ and $\tau_{2,2}$ could be assigned (if they
arrive at times $A_{1,2}$ and $A_{2,2}$) to the CPUs $P_1$ and $P_2$, since the system feasibility ensures that $\tau_{1,1}$
and $\tau_{2,1}$ will be completed by their deadlines. Consequently, when ignoring the schedule of $\tau_{3,1}$, we
see that $t_{next}$ is the earliest time instant (after the time $t$) such that all processors may be required.
Indeed, $t_{next}$ is the earliest time instant after time $t$ such that $\Pi(\tau_{3,1}, t, t_{next}) = 3 - 0 - 3 = 0$.

Figure 1: Illustration of a 3-task system.

Since $t_{next}$ is the earliest time instant (after the current time $t$) such that $P_3$ may be required by
another job than $\tau_{3,1}$ (assuming that all the other active jobs are scheduled on other processors),
one can conclude that $P_3$ will only execute the job $\tau_{3,1}$ between time instants $t$ and $t_{next}$. That
is, $P_3$ can modify its working speed in such a way that $\tau_{3,1}$ completes in the
worst case at time $\min\{D_{3,1}, t_{next}\}$ (or earlier if $s_{min}$ imposes it).

Principle: Our on-line power-aware algorithm deals with a priority rule that assigns a constant
priority to each job. In this work, these priorities are determined by the algorithm EDF$^{(k)}$. Our
power-aware algorithm is only applied when a job $\tau_{i,j}$ is to be allocated to a CPU $P_\ell$ at time
$t$ during the scheduling, which corresponds to its arrival or to the completion of a higher-priority
job. At this time, our method determines the earliest time instant $t_{next}$ such that $P_\ell$ may be needed
by another job. The function $\Pi(\tau_{i,j}, t, t')$ (based on the deadlines of the jobs currently executing)
is used to sweep the task set (with a running time linear in the number of tasks). Notice that
the function $\Pi(\tau_{i,j}, t, t')$ needs to be evaluated only at the deadlines of the jobs currently under
execution and at the next (possible) arrival time of every task (since between these instants, the
function is constant). It follows from Corollary 3 that $P_\ell$ will not execute another job
than $\tau_{i,j}$ until the time instant $t_{next}$. The speed for $\tau_{i,j}$ can be safely reduced in such a way that
it completes at time $\min\{D_{i,j}, t_{next}\}$ (if the corresponding speed is lower than the current one).
Obviously, the working speed of a processor can never be reduced below $s_{min}$.



Algorithm 2: Determination of $t_{next}$

Input: $t$, $\tau_i$
Output: $t_{next}$
begin
    $n_a$ := number of active tasks at time $t$;
    $L$ := set of the next deadline and possible arrival time of each task, sorted by increasing
         order of the occurring time;
    $t_{next}$ := $t$;
    $\Pi$ := $m - (n_a - 1)$;
    while ($\Pi > 0$ and $L \ne \emptyset$) do
        $e$ := L.top();
        $t_{next}$ := e.occurring_time;
        if (e.task $\ne \tau_i$) and (e.type == deadline) then $\Pi$ := $\Pi + 1$;
        else if (e.type == arrival) then $\Pi$ := $\Pi - 1$;
        L.pop();
    return $t_{next}$;
end
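
The event sweep of Algorithm 2 can be sketched in Python as follows. The function name, the dictionary keys and the sample values are hypothetical. One point is an assumption of this sketch: when a deadline and an arrival occur at the same instant, the arrival is processed first, which is the more cautious order.

```python
def next_required_time(t, i, tasks, m):
    """Earliest instant at which the CPU given to task i may be needed again.

    tasks: list of dicts with keys 'active', 'next_deadline' and
    'next_arrival' (absolute times >= t); i is the index of the task
    whose job is being allocated.
    """
    n_active = sum(1 for x in tasks if x["active"])
    events = []
    for idx, x in enumerate(tasks):
        if x["active"]:
            events.append((x["next_deadline"], "deadline", idx))
        events.append((x["next_arrival"], "arrival", idx))
    # Increasing occurring time; arrivals before deadlines on ties (assumption).
    events.sort(key=lambda e: (e[0], e[1] == "deadline"))

    t_next = t
    free = m - (n_active - 1)  # free CPUs at time t, ignoring task i's own job
    for time, kind, idx in events:
        if free <= 0:
            break
        t_next = time
        if kind == "deadline" and idx != i:
            free += 1  # another job must have completed by its deadline
        elif kind == "arrival":
            free -= 1  # a new job may claim a CPU
    return t_next


tasks = [
    {"active": True, "next_deadline": 16, "next_arrival": 18},
    {"active": False, "next_deadline": None, "next_arrival": 13},
    {"active": True, "next_deadline": 18, "next_arrival": 20},
]
print(next_required_time(t=10, i=2, tasks=tasks, m=2))
```

Here the idle second task may release a job at time 13, so the CPU granted to the third task is only guaranteed to be undisturbed until then.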





Let $s_i$ denote the processor speed of the active job $\tau_{i,j}$. This speed $s_i$ is initialized when $\tau_{i,j}$ is
released. In a simple version of the MOTE technique, the execution speed of every released job
is initially set to $s_{ol}$, since we assume that the priorities are assigned by EDF$^{(k)}$ and we proved
that the system feasibility is guaranteed when it is scheduled by EDF$^{(k)}$ at speed $s_{ol}$ (Theorem 2).
However, we adopt here another initialization step in order to profit from the individual speed of
each processor. In this "optimized" initialization step, two cases may arise at the arrival of the job
$\tau_{i,j}$:

1. if $\tau_i \in (\tau \setminus \tau^{(k)})$ (the set of the $(k - 1)$ tasks with highest densities), $s_i$ is fixed to $\lambda_i$.

2. if $\tau_i \in \tau^{(k)}$, $s_i$ is fixed to $\lambda_k + \frac{\lambda_{sum}(\tau^{(k+1)})}{m - k + 1}$.

We proved that all deadlines are met when the system is scheduled while using this rule. Then,
when the job $\tau_{i,j}$ is to be allocated to a CPU during the scheduling, we determine the earliest time
instant $t_{next}$ such that $\Pi(\tau_{i,j}, t, t_{next}) \le 0$ and, if $t_{next} > t$, one has:

    $s_i := \min\left\{ s_i,\; \frac{\omega_i^{s_i}(t) \cdot s_i}{\min\{D_{i,j}, t_{next}\} - t} \right\}$    (4)

We proved also that the system feasibility is not jeopardized by this speed modification.

Algorithm 3: Speed allocation to $\tau_{i,j}$ at time $t$

Input: $\tau_{i,j}$
Output:
begin
    // Initialization step
    if ($\tau_{i,j}$ is allocated for the first time) then
        if ($i < k$) then $s_i$ := $\lambda_i$;
        else $s_i$ := $\lambda_k + \lambda_{sum}(\tau^{(k+1)}) / (m - k + 1)$;
    // MOTE step
    if ($m < n$) then $t_{next}$ := Call Algorithm 2($t$, $\tau_i$);
    else $t_{next}$ := $\infty$;
    if ($t_{next} > t$) then
        $s_i$ := $\min\{s_i,\; \omega_i^{s_i}(t) \cdot s_i / (\min\{D_{i,j}, t_{next}\} - t)\}$;
        if ($s_i < s_{min}$) then $s_i$ := $s_{min}$;
        $\tau_{i,j}$ is allocated to any available CPU;
        The speed of the designated CPU is fixed to $s_i$;
    else No speed reduction can occur. The EDF$^{(k)}$ rule applies; $\tau_{i,j}$ either preempts the lowest
         priority job currently under execution or is allocated to any available CPU, and the
         processor speed is fixed to $s_i$;
end
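
The speed update of Equation (4), together with the clamp to the minimal speed, can be sketched as follows. The function and parameter names are hypothetical; the remaining requirement is taken as the worst-case time still needed at the current speed, which is our reading of the definition of $\omega$.

```python
import math


def mote_speed(s_i, remaining_time, deadline, t_next, t, s_min):
    """Reduced speed for a job being allocated at time t.

    s_i: speed currently assigned to the job.
    remaining_time: worst-case time still needed at speed s_i.
    deadline: absolute deadline of the job; t_next: next instant at
    which the CPU may be required by another job.
    """
    if t_next <= t:
        return s_i                      # no room for a reduction
    window = min(deadline, t_next) - t  # time during which the CPU is safe
    if window <= 0:
        return s_i                      # defensive guard (assumption)
    stretched = remaining_time * s_i / window
    return max(s_min, min(s_i, stretched))


print(mote_speed(0.8, 2.0, deadline=18, t_next=13, t=10, s_min=0.286))
print(mote_speed(0.8, 2.0, deadline=18, t_next=math.inf, t=10, s_min=0.286))
```

In the second call the job could be stretched up to its deadline, but the result is bounded from below by the minimal speed.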



4.4 Implementation 

Before the system starts its execution, our algorithm computes the speed $s_{ol}$ by determining the
optimal value of $k$ thanks to Equation (2) (see Algorithm 1). Then, while the system is running,
there is only one kind of situation where the decision to reduce or not the CPU speed for a job
is taken: when it is allocated to an available CPU (upon its release, or when it is waiting for
an available processor at the head of the ready-Q and a job terminates its execution). A detailed
description of the applied procedure at any allocation time is given in Algorithm 3. Algorithm 2
shows how to compute $t_{next}$ with a linearithmic (also called quasilinear) worst-case computing
complexity $O(n \cdot \log(n))$, where $n$ is the number of tasks.

It is worth noting that the MOTE step (see Algorithm 3) is applied at most once to each job (and
only if $i \ge k$); indeed, a job whose speed has been changed by this step will not be preempted
in the future and thus will not be (re-)stored in the ready-Q before its end of execution. However,
when the speed of a job (with a normal priority) is initialized but not modified by the MOTE step
at its arrival, it can possibly be reduced by the MOTE step in the future, if the job is at the head of
the ready-Q and another job completes its execution. Section 5 shows that the MOTE algorithm
indeed significantly improves the energy consumption of a real-time sporadic system.

5 Experiments



5.1 Introduction 

In our simulations, we have scheduled periodic constrained-deadline systems (i.e., $T_i$ is here the
exact inter-arrival delay for each task $\tau_i$). The energy consumption of each generated system
is computed by simulating the three methods described in this paper during one hyper-period
(i.e., the least common multiple of the task periods); indeed, the authors of [9] show that, for the
specific case of synchronous periodic task systems, the schedule repeats from the origin with
a period equal to the hyper-period. The three methods are: the off-line speed reduction for
EDF (Equation (1)), the off-line speed reduction for EDF$^{(k)}$ (Equation (2)) and the MOTE
algorithm (combined with EDF$^{(k)}$). The energy consumptions generated by these three methods are
compared with the consumption of the $S_{max}$ method (i.e., all jobs are executed at the maximal
processor speed $s_{max}$), while using different processor models. During our simulations, about
5000 constrained-deadline systems were generated and simulated, with the number of tasks $n$ in
$[5, 40]$ (with density below 1 and $\lambda_{sum}(\tau)$ between 1 and 10). During each simulation, the ACET
of each job was generated using a pseudo-random generator. We produced many graphs from
our results, but they are omitted here due to space limitations. To ensure that the number $m$ of
processors is sufficient to schedule the generated systems at speed s max , m is determined by the 
following Equation (from FT51 ): 



    $m := \min\left\{ n,\; \left\lceil \frac{\lambda_{sum}(\tau) - \lambda_{max}(\tau)}{1 - \lambda_{max}(\tau)} \right\rceil \right\}$
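
A small Python sketch of this processor-count rule is given below. The function name is hypothetical, and two points are assumptions of the sketch: the quotient is rounded up to the next integer, and a task set whose maximal density reaches 1 simply gets one processor per task.

```python
import math


def processors_needed(densities):
    """Number of processors used for a generated task set (sketch)."""
    n = len(densities)
    lam_sum = sum(densities)
    lam_max = max(densities)
    if lam_max >= 1.0:
        return n  # guard against a zero denominator (assumption)
    return min(n, math.ceil((lam_sum - lam_max) / (1.0 - lam_max)))


print(processors_needed([0.5, 0.5, 0.5, 0.5, 0.5]))
```

For five tasks of density 0.5, the rule yields 4 processors, for which the condition of Theorem 1 holds with equality (2.5 on both sides).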



5.2 Processor models 

In our experiments, we used two realistic processor models. These models, noted P1 and P2 in
the following, are derived from the processor Crusoe TM5400 from Transmeta and the processor
StrongARM SA-1100 from Intel, respectively. In these two processor models, the voltage can only
vary in a limited range. Moreover, only a fixed number of functioning frequencies/voltages are
available. For that reason, we use the available processor speed immediately above the desired
one, if the latter is not available. Note that the use of the two adjacent frequencies to the requested
frequency is more efficient from an energy point of view (see, for instance, [12]). Table 1 (adapted
from [17] and [20]) summarizes the relationship between frequency, voltage, power consumption
and the corresponding speed for the Transmeta TM5400 (P1) and the StrongARM SA-1100 (P2).
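
The rounding rule described above (use the available speed immediately above the desired one) can be sketched as follows. The speed lists are the normalised speeds of Table 1; the function name is hypothetical.

```python
# Available normalised speeds from Table 1, highest first.
P1_SPEEDS = [1.0, 0.857, 0.714, 0.571, 0.429, 0.286]
P2_SPEEDS = [1.0, 0.947, 0.874, 0.801, 0.728, 0.655,
             0.583, 0.510, 0.437, 0.364, 0.291]


def round_up_speed(desired, available):
    """Return the smallest available speed that is >= the desired one."""
    candidates = [s for s in available if s >= desired]
    if not candidates:
        raise ValueError("desired speed exceeds the maximal speed")
    return min(candidates)


print(round_up_speed(0.6, P1_SPEEDS))
print(round_up_speed(0.6, P2_SPEEDS))
```

The finer speed grid of P2 lets the selected speed stay closer to the requested one.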



CPU   Freq. (MHz)   Volt. (V)   Power (%)   Speed
P1    700           1.65        100         1
P1    600           1.60        80.59       0.857
P1    500           1.50        59.03       0.714
P1    400           1.40        41.14       0.571
P1    300           1.25        24.60       0.429
P1    200           1.10        12.70       0.286
P2    206           1.50        100         1
P2    195           1.42        78.9        0.947
P2    180           1.30        63.2        0.874
P2    165           1.20        50.0        0.801
P2    150           1.15        39.9        0.728
P2    135           1.10        33.6        0.655
P2    120           1.08        33.0        0.583
P2    105           0.95        19.8        0.510
P2    90            0.90        15.0        0.437
P2    75            0.82        11.8        0.364
P2    60            0.80        9.44        0.291

Table 1: Processor characteristics.

Table 2 provides the average consumption profit generated by each method (expressed in
percent), compared to the consumption using the $S_{max}$ method over the entire simulation.



Results with the StrongARM SA-1100 processor

Method name          Power saving over S_max   Standard deviation
offline EDF          4.33 %                    3.34
offline EDF^(k)      27.12 %                   10.24
MOTE                 44.74 %                   8.82

Results with the Crusoe processor

Method name          Power saving over S_max   Standard deviation
offline EDF          0.62 %                    0.76
offline EDF^(k)      5.91 %                    4.38
MOTE                 23.3 %                    7.55

Table 2: Simulation results.



5.3 Observations 

We observe a large variation in the power saving of our algorithms when they are simulated
upon the Crusoe processor and upon the StrongARM SA-1100. This variation is due to the
difference in the shape of their consumption functions: the consumption function of the StrongARM
processor has a higher curvature than that of the Crusoe processor. That is, a speed reduction in
the StrongARM implies a more significant reduction of the system energy consumption. This
reduction is even more significant when we use the standard dynamic consumption
model where the power consumption function is modeled as a constant plus a cubic function (or
at least a quadratic function) of the speed [22]. However, our results for this theoretical case are
omitted due to space limitations.

According to [18], the Crusoe processor performs a speed transition in less than 20 μs. This
time overhead is negligible for most real-time systems, since the order of magnitude of the task
characteristics is a few milliseconds. With the StrongARM SA-1100 processor, Pouwelse
et al. [17] report that a voltage/speed change can be performed in less than 140 μs. If this cannot
be considered negligible, then, since we have at most two speed transitions for each job (one
initially and one for a MOTE step), the "voltage change overheads" can be incorporated into the
worst-case execution requirement.
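
A minimal sketch of this overhead accounting is given below, assuming times are expressed in milliseconds and that each transition is simply charged to the job's worst-case execution requirement. The function name and the sample values are hypothetical.

```python
def inflate_wcet(wcet, transition_time, transitions_per_job=2):
    """Add the speed-transition overheads to a worst-case execution requirement.

    transitions_per_job defaults to 2: one transition at initialization
    and one for a MOTE step.
    """
    return wcet + transitions_per_job * transition_time


# A 5 ms requirement with a 140 microsecond (0.14 ms) transition time.
print(inflate_wcet(5.0, 0.14))
```

Since the requirement is defined at maximal speed, a job running at a lower speed is granted proportionally more time, so this accounting errs on the safe side under the stated assumption.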

6 Future works 

Currently this work addresses the impact of the proposed scheduling algorithms only on the
dynamic power component of the overall microprocessor power dissipation. The proposed methods do
not take into account the power dissipated to hold the circuit state and/or the power dissipation due
to the imperfections of the physical implementation (static power dissipation component).
However, it is well known that for integrated circuits manufactured with technologies below
130 nm, and especially with current 90 nm and 65 nm technologies, the static power dissipation
component becomes very important and comparable to the dynamic power dissipation [10]. A
significant research effort has been, and still is, devoted to static power dissipation
reduction techniques. The proposed methods target not only low-level hardware actions (such
as clock gating) but also higher-level (operating system) actions forcing the processor to enter
one of the multiple low-power dissipation modes for a better trade-off between power saving and
wake-up time (see [1] as an example). The problem of the increased static power dissipation of
sub-micron technologies is the main motivation for our future work, in which we will extend
the existing controllable parameters of our scheduling algorithms (voltage and frequency) with a
processor switch-off parameter.

7 Conclusion



In this paper, we proposed two approaches which reduce the energy consumption of real-time
systems implemented upon multiprocessor platforms. The first one is an adaptation of global
EDF, called EDF$^{(k)}$, which allows a lower computing speed of the processors
than EDF. The second proposal (called MOTE) is an on-line low-power algorithm which takes into
account the "unused" CPU times to adjust the processor speeds while the system is running. We
show in our experiments that this on-line technique can significantly reduce the processors'
energy consumption (up to 45% for the Intel StrongARM SA-1100). Moreover, our MOTE technique
can incorporate the speed/voltage change overheads by simply adding the speed transition time
of the processors to the worst-case workload of each task. Our two methods address sporadic
constrained-deadline real-time systems. This model includes the most popular one: the sporadic
and implicit-deadline task systems. The complexity of each decision (at any job allocation time) is
linear in the number of ready jobs in the system. This low complexity makes the MOTE strategy
a very attractive technique.